Discourse-level features for statistical machine translation
Abstract
This talk shows how disambiguating discourse connectives can improve their automatic translation. Connectives are a class of frequent functional lexical items that play an important role in text readability and coherence. Longer-range context is taken into account to learn the rhetorical relations they signal. The labels produced by a discourse connective classifier are then integrated into statistical translation models (English to French and English to German). Linguistic annotation and evaluation issues are discussed together with results from automated scoring. Discourse connectives are also especially prone to translationese: translated texts show either increased or decreased use of discourse markers. Work in progress aims to capture this natural explicitation and implicitation of discourse connectives in current statistical machine translation models.
Similar resources
Feature Weight Optimization for Discourse-Level SMT
We present an approach to feature weight optimization for document-level decoding. This is an essential task for enabling future development of discourse-level statistical machine translation, as it allows easy integration of discourse features in the decoding process. We extend the framework of sentence-level feature weight optimization to the document level. We show experimentally that we can...
Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation
We describe Docent, an open-source decoder for statistical machine translation that breaks with the usual sentence-by-sentence paradigm and translates complete documents as units. By taking translation to the document level, our decoder can handle feature models with arbitrary discourse-wide dependencies and constitutes an essential infrastructure component in the quest for discourse-aware SMT.
Using Sense-labeled Discourse Connectives for Statistical Machine Translation
This article shows how the automatic disambiguation of discourse connectives can improve Statistical Machine Translation (SMT) from English to French. Connectives are first disambiguated in terms of the discourse relation they signal between segments. Several classifiers trained using syntactic and semantic features reach state-of-the-art performance, with F1 scores of 0.6 to 0.8 over thirteen...
Discourse-level Annotation over Europarl for Machine Translation: Connectives and Pronouns
This paper describes methods and results for the annotation of two discourse-level phenomena, connectives and pronouns, over a multilingual parallel corpus. Excerpts from Europarl in English and French have been annotated with disambiguation information for connectives and pronouns, for about 3600 tokens. This data is then used in several ways: for cross-linguistic studies, for training automat...
Lexical Chains meet Word Embeddings in Document-level Statistical Machine Translation
The phrase-based Statistical Machine Translation (SMT) approach deals with sentences in isolation, making it difficult to consider discourse context in translation. This poses a challenge for ambiguous words that need discourse knowledge to be correctly translated. We propose a method that benefits from the semantic similarity in lexical chains to improve SMT output by integrating it in a docum...